Target transfer Q-learning and its convergence analysis


Similar Resources

Fastest Convergence for Q-learning

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins’ original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-s...
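
As a rough illustration of the matrix-gain idea described in this abstract, the Python sketch below runs a Zap-style tabular Q-learning update on a small random MDP. The toy MDP, the two step-size schedules (alpha for the Q-values, beta for the matrix estimate), and every variable name here are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a]: next-state distribution
R = rng.random((n_states, n_actions))                             # deterministic rewards

d = n_states * n_actions
theta = np.zeros(d)          # flattened tabular Q-function
A_hat = -np.eye(d)           # running estimate of the linearization matrix

def idx(s, a):
    return s * n_actions + a

s = 0
for n in range(1, 20001):
    a = int(rng.integers(n_actions))                  # random exploratory behavior policy
    s_next = int(rng.choice(n_states, p=P[s, a]))
    r = R[s, a]

    Q = theta.reshape(n_states, n_actions)
    a_next = int(np.argmax(Q[s_next]))                # greedy action at the next state

    psi = np.zeros(d)
    psi[idx(s, a)] = 1.0                              # basis vector for (s, a)
    psi_next = np.zeros(d)
    psi_next[idx(s_next, a_next)] = 1.0

    td_error = r + gamma * theta[idx(s_next, a_next)] - theta[idx(s, a)]

    # Two time-scales: the matrix estimate moves on a faster schedule than theta.
    alpha = 1.0 / n
    beta = 1.0 / n ** 0.85
    A_n = np.outer(psi, gamma * psi_next - psi)       # sample of the linearization matrix
    A_hat += beta * (A_n - A_hat)

    # Newton-Raphson-like step: precondition the TD update by the matrix gain.
    theta += alpha * np.linalg.pinv(-A_hat) @ (psi * td_error)

    s = s_next

print(theta.reshape(n_states, n_actions))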


Convergence of Optimistic and Incremental Q-Learning

We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a new and novel algo...
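
A minimal Python sketch of the optimistic variant described above: the Q-values start at a large optimistic constant and actions are chosen purely greedily, so the optimism itself drives exploration. The toy MDP, the choice of Q_INIT, and the per-pair step-size schedule are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a]: next-state distribution
R = rng.random((n_states, n_actions))                             # rewards in [0, 1]

Q_INIT = 1.0 / (1.0 - gamma)               # upper bound on the discounted return, hence "optimistic"
Q = np.full((n_states, n_actions), Q_INIT)
visits = np.zeros((n_states, n_actions))

s = 0
for t in range(50000):
    a = int(np.argmax(Q[s]))               # purely greedy action selection
    s_next = int(rng.choice(n_states, p=P[s, a]))
    r = R[s, a]
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]             # per-pair decaying step size (assumed schedule)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q)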


The Asymptotic Convergence-Rate of Q-learning

In this paper we show that for discounted MDPs with discount factor γ > 1/2 the asymptotic rate of convergence of Q-learning is O(1/t^{R(1-γ)}) if R(1-γ) < 1/2 and O(√(log log t / t)) otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here R = p_min/p_max is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to conv...
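
A small numerical illustration of the rate statement above, using made-up occupation frequencies and discount factor: compute R = p_min/p_max and report which asymptotic regime applies.

import numpy as np

gamma = 0.8                                   # discount factor (made-up value)
occupation = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical state-action occupation frequencies

R = occupation.min() / occupation.max()       # R = p_min / p_max
exponent = R * (1.0 - gamma)

if exponent < 0.5:
    print(f"predicted asymptotic rate: O(1 / t^{exponent:.3f})")
else:
    print("predicted asymptotic rate: O(sqrt(log log t / t))")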


On Convergence of q-Homotopy Analysis Method

The convergence of the q-homotopy analysis method (q-HAM) is studied in the present paper. It is proven that under certain conditions the solution of the equation associated with the original problem exists as a power series in q. So, under a special constraint, the q-homotopy analysis method does converge to the exact solution of nonlinear problems. An error estimate is also provided. The ...
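
For orientation only, and as an assumption here (the abstract is truncated), the power series referred to can be written in the standard q-HAM form

\phi(x,t;q) = u_0(x,t) + \sum_{m=1}^{\infty} u_m(x,t)\, q^m ,

and under the stated convergence conditions the solution of the original problem is recovered by evaluating the series at q = 1/n with n \ge 1:

u(x,t) = u_0(x,t) + \sum_{m=1}^{\infty} u_m(x,t) \left(\tfrac{1}{n}\right)^{m} .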


Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value ...
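
As a rough, self-contained contrast between the two approaches compared above, the Python sketch below runs direct (incremental) Q-learning and an indirect, model-based estimate-then-plan method on the same batch of samples from a toy MDP. The toy MDP, the sampling scheme, the step-size schedule, and all names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, gamma = 3, 2, 0.9
P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R_true = rng.random((n_states, n_actions))

# Collect a fixed batch of samples (s, a, r, s'), drawn uniformly over state-action pairs.
samples = []
for _ in range(5000):
    s, a = int(rng.integers(n_states)), int(rng.integers(n_actions))
    s_next = int(rng.choice(n_states, p=P_true[s, a]))
    samples.append((s, a, R_true[s, a], s_next))

# Direct: incremental Q-learning on the sample stream.
Q = np.zeros((n_states, n_actions))
for t, (s, a, r, s_next) in enumerate(samples, start=1):
    Q[s, a] += (1.0 / t ** 0.6) * (r + gamma * Q[s_next].max() - Q[s, a])

# Indirect: estimate the model from counts, then run value iteration off-line.
counts = np.zeros((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))
for s, a, r, s_next in samples:
    counts[s, a, s_next] += 1
    reward_sum[s, a] += r
N = counts.sum(axis=2, keepdims=True)
P_hat = counts / np.maximum(N, 1)
R_hat = reward_sum / np.maximum(N[..., 0], 1)

Q_model = np.zeros((n_states, n_actions))
for _ in range(200):
    V = Q_model.max(axis=1)
    Q_model = R_hat + gamma * P_hat @ V

print("max |Q_direct - Q_indirect| =", np.abs(Q - Q_model).max())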



Journal

Journal title: Neurocomputing

Year: 2020

ISSN: 0925-2312

DOI: 10.1016/j.neucom.2020.02.117